Assignment 3 - Data Analysis

Author

Niranjan Krishnan Devaraj

Data Analysis Project

Hey! In this project, we explore the Housing Price Index (HPI) for the United States in the context of its economic indicators, S&P 500 closing values, Air Quality Index, and geographical locations. The data comes from several sources, including the Federal Housing Finance Agency, the United States Environmental Protection Agency, the International Monetary Fund, Yahoo Finance, and the Google Maps API. This builds on work done previously as part of Assignment 1, where the HPI was analyzed in the context of the US economy.

This is the dataflow diagram of how the final dataset we operate on was formed. The process was carried out through several distinct steps in R, scripts for which are available in the associated GitHub repo. The final dataset was then saved, and all operations thereafter were performed directly on it.

graph TD;
    A[Combine AQI Data] --> B[combined_aqi_data.csv];
    B --> C[Combine with HPI_master.csv];
    C --> D[Combine with USIndicatorsEdited.csv];
    D --> E[Fetch GSPC Closing Prices];
    D --> F[Fetch Coordinates using Google Maps API];
    B --> F;
    C --> F;
    E --> G[Final Dataset];
    F --> G;

This diagram outlines our step-by-step approach to compiling a robust dataset for analysis. We start by bringing together air quality index data, then combine it with housing price indices and economic indicators for the US. Next, we incorporate data on GSPC closing prices and geographical coordinates. This comprehensive dataset enables us to explore correlations between air quality, housing prices, economic factors, and geographic locations, providing valuable insights for our analysis.
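The join logic behind this pipeline can be sketched with dplyr. The key columns (CBSA, Year) and the toy values below are assumptions for illustration based on the diagram, not the exact scripts from the repo:

```r
# Sketch of the merge pipeline above, using dplyr joins on toy data.
library(dplyr)

aqi  <- data.frame(CBSA = c(10420, 10420), Year = c(2020, 2021), Max.AQI = c(120, 95))
hpi  <- data.frame(CBSA = c(10420, 10420), Year = c(2020, 2021), index_nsa = c(210.3, 225.8))
econ <- data.frame(Year = c(2020, 2021), Unemployment.rate = c(8.1, 5.4))

final <- aqi %>%
  left_join(hpi,  by = c("CBSA", "Year")) %>%  # attach housing prices per CBSA-year
  left_join(econ, by = "Year")                 # attach national indicators per year

head(final)
```

Each join keeps every AQI row and attaches the matching HPI and indicator columns, which is why the real pipeline ends with one wide table per CBSA-year.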

Libraries Used

#| echo: false
#| warning: false
#| output: false

library(htmlwidgets)
library(corrplot)
library(tidyr)
library(dplyr)
library(MASS)
library(car)
library(ggplot2)
library(plotly)
library(leaflet)
library(knitr)
library(kableExtra)

Data Read Operation

# Read files into a dataframe
final_data <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data1.csv")
final_data_clean <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data_clean_VF.csv")
final_data_averaged <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data_averaged.csv")

Quantitative Analyses

Correlation Analysis

# Calculate correlation matrix
cor_matrix <- cor(final_data_clean[, sapply(final_data_clean, is.numeric)])

# Plot correlation matrix as a heatmap
corrplot(cor_matrix, method = "color", type = "upper", order = "hclust",
         tl.col = "black", tl.srt = 45, tl.cex = 0.35)

# Find variables with strong correlations
strong_correlations <- colnames(cor_matrix)[rowSums(abs(cor_matrix) > 0.7) > 1]

# Print the variables with strong correlations  
print(strong_correlations)
 [1] "Year"                                                                              
 [2] "Moderate.Days"                                                                     
 [3] "Unhealthy.for.Sensitive.Groups.Days"                                               
 [4] "Unhealthy.Days"                                                                    
 [5] "X90th.Percentile.AQI"                                                              
 [6] "Median.AQI"                                                                        
 [7] "index_nsa"                                                                         
 [8] "GSPC.Close"                                                                        
 [9] "Gross.domestic.product..constant.prices"                                           
[10] "Gross.domestic.product.per.capita..constant.prices"                                
[11] "Gross.domestic.product.per.capita..current.prices"                                 
[12] "Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total"
[13] "Volume.of.imports.of.goods.and.services"                                           
[14] "Volume.of.exports.of.goods.and.services"                                           
[15] "group"                                                                             

When multiple variables in a dataset are strongly correlated, it creates a tricky situation for statistical analysis called multicollinearity. It becomes hard to tell how each individual variable contributes to the outcome we’re interested in, whether that’s air quality or economic factors like GDP. With such tangled relationships, the results of our analysis become less reliable; it’s like trying to separate threads that are tightly woven together. Multicollinearity inflates the standard errors of coefficient estimates and makes interpretations confusing. To tackle it, we need to carefully choose which variables to include in our analysis, transform the data if needed, or use specialized techniques to untangle the relationships between variables. Doing so keeps our models accurate and trustworthy, allowing us to draw meaningful conclusions from the data.
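One quick way to see which pairs drive this is to list every pair of variables whose absolute correlation exceeds the 0.7 cutoff. A minimal, self-contained sketch using the built-in mtcars data as a stand-in for final_data_clean (the same pattern applies to our cor_matrix):

```r
# List variable pairs with |r| > 0.7, using built-in mtcars as a stand-in.
cm <- cor(mtcars)

# upper.tri() keeps each pair once and skips the diagonal of 1s
idx <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)

pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
pairs[order(-abs(pairs$r)), ]
```

Listing pairs, rather than just flagging variables, shows exactly which relationships are entangled before deciding what to drop.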

Multiple Regression with Step Evaluation and Multicollinearity Tests

Using stepwise regression with the stepAIC (Akaike Information Criterion) method allows for automated variable selection by iteratively adding or removing predictors to optimize model fit while penalizing for model complexity, enhancing interpretability.
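Before applying it to the full dataset below, the mechanics of stepAIC can be seen on a small built-in example. This toy fit on mtcars is purely illustrative, not the assignment model:

```r
# Toy stepAIC run: start from a full model and let AIC-driven selection
# add or remove terms in both directions.
library(MASS)

full     <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
step_fit <- stepAIC(full, direction = "both", trace = FALSE)

# The retained formula after selection
formula(step_fit)
```

Because the starting model is itself a candidate, the selected model's AIC can never be worse than where the search began.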

# Remove CBSA, Year, and group columns
# (dplyr::select is qualified because MASS also exports a select())
data_subset <- final_data_clean %>%
  dplyr::select(-CBSA, -Year, -group)

# Perform stepwise regression
stepwise_model <- stepAIC(lm(index_nsa ~ ., data = data_subset), direction = "both")

Multicollinearity

We now remove multicollinear variables from the model selected by stepAIC. We identified a list of candidate variables and, through trial and error, reduced the effect of multicollinearity in the model.

# Step 2: Check for multicollinearity
vif_values <- vif(stepwise_model)
print(vif_values)
                                                                     Days.with.AQI 
                                                                          3.326653 
                                                                     Moderate.Days 
                                                                          5.754717 
                                               Unhealthy.for.Sensitive.Groups.Days 
                                                                          5.529663 
                                                                    Unhealthy.Days 
                                                                          4.668829 
                                                                           Max.AQI 
                                                                          1.564234 
                                                              X90th.Percentile.AQI 
                                                                          7.933300 
                                                                        Median.AQI 
                                                                          7.522855 
                                                                           Days.CO 
                                                                          1.602317 
                                                                          Days.NO2 
                                                                          1.573584 
                                                                        Days.Ozone 
                                                                          2.352098 
                                                                        GSPC.Close 
                                                                         22.513277 
                                           Gross.domestic.product..constant.prices 
                                                                          9.546861 
                                Gross.domestic.product.per.capita..constant.prices 
                                                                        142.623370 
                                 Gross.domestic.product.per.capita..current.prices 
                                                                        175.013429 
Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total 
                                                                         38.509708 
                                                Inflation..average.consumer.prices 
                                                                          4.260568 
                                           Volume.of.imports.of.goods.and.services 
                                                                         11.107569 
                                           Volume.of.exports.of.goods.and.services 
                                                                          5.481977 
                                                                 Unemployment.rate 
                                                                          4.351870 
                                                           Current.account.balance 
                                                                          7.891527 
                                                                       lon_numeric 
                                                                          1.103001 
                                                                       lat_numeric 
                                                                          1.104620 
# Set a threshold for VIF values
threshold <- 10

# Identify variables with VIF above the threshold
high_collinearity_vars <- names(vif_values)[vif_values > threshold]

print(high_collinearity_vars)
[1] "GSPC.Close"                                                                        
[2] "Gross.domestic.product.per.capita..constant.prices"                                
[3] "Gross.domestic.product.per.capita..current.prices"                                 
[4] "Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total"
[5] "Volume.of.imports.of.goods.and.services"                                           
# Remove variables with high collinearity from the model
final_model <- update(stepwise_model, . ~ . - 
                        Gross.domestic.product.per.capita..constant.prices - GSPC.Close - 
                        Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total )

# Check for multicollinearity again
vif_values <- vif(final_model)
print(vif_values)
                                    Days.with.AQI 
                                         3.134972 
                                    Moderate.Days 
                                         5.749523 
              Unhealthy.for.Sensitive.Groups.Days 
                                         5.515527 
                                   Unhealthy.Days 
                                         4.665915 
                                          Max.AQI 
                                         1.556075 
                             X90th.Percentile.AQI 
                                         7.775680 
                                       Median.AQI 
                                         7.477114 
                                          Days.CO 
                                         1.534515 
                                         Days.NO2 
                                         1.509965 
                                       Days.Ozone 
                                         2.327407 
          Gross.domestic.product..constant.prices 
                                         9.097613 
Gross.domestic.product.per.capita..current.prices 
                                         2.287780 
               Inflation..average.consumer.prices 
                                         1.764991 
          Volume.of.imports.of.goods.and.services 
                                         9.859812 
          Volume.of.exports.of.goods.and.services 
                                         3.046563 
                                Unemployment.rate 
                                         1.892476 
                          Current.account.balance 
                                         1.288833 
                                      lon_numeric 
                                         1.102851 
                                      lat_numeric 
                                         1.100786 
# Set a threshold for VIF values
threshold <- 10

# Identify variables with VIF above the threshold
high_collinearity_vars <- names(vif_values)[vif_values > threshold]

print(high_collinearity_vars)
character(0)
# Summary of the updated final model
summary(final_model)

Call:
lm(formula = index_nsa ~ Days.with.AQI + Moderate.Days + Unhealthy.for.Sensitive.Groups.Days + 
    Unhealthy.Days + Max.AQI + X90th.Percentile.AQI + Median.AQI + 
    Days.CO + Days.NO2 + Days.Ozone + Gross.domestic.product..constant.prices + 
    Gross.domestic.product.per.capita..current.prices + Inflation..average.consumer.prices + 
    Volume.of.imports.of.goods.and.services + Volume.of.exports.of.goods.and.services + 
    Unemployment.rate + Current.account.balance + lon_numeric + 
    lat_numeric, data = data_subset)

Residuals:
    Min      1Q  Median      3Q     Max 
-147.53  -23.08   -2.71   14.87  457.04 

Coefficients:
                                                    Estimate Std. Error t value
(Intercept)                                       -5.839e+01  2.738e+00 -21.323
Days.with.AQI                                     -8.665e-02  3.860e-03 -22.449
Moderate.Days                                      1.378e-01  7.061e-03  19.520
Unhealthy.for.Sensitive.Groups.Days                3.055e-02  2.344e-02   1.303
Unhealthy.Days                                     1.769e-01  3.865e-02   4.577
Max.AQI                                            2.636e-02  2.877e-03   9.162
X90th.Percentile.AQI                               6.001e-02  1.725e-02   3.478
Median.AQI                                        -6.531e-01  3.804e-02 -17.168
Days.CO                                            7.598e-02  5.859e-03  12.968
Days.NO2                                           1.458e-01  5.138e-03  28.384
Days.Ozone                                         8.508e-02  2.855e-03  29.799
Gross.domestic.product..constant.prices           -2.834e-01  3.041e-01  -0.932
Gross.domestic.product.per.capita..current.prices  3.979e-03  1.912e-05 208.074
Inflation..average.consumer.prices                 6.963e+00  1.698e-01  41.003
Volume.of.imports.of.goods.and.services           -5.120e-01  9.243e-02  -5.539
Volume.of.exports.of.goods.and.services           -2.897e-01  5.727e-02  -5.058
Unemployment.rate                                 -2.610e-01  1.526e-01  -1.710
Current.account.balance                           -5.656e+00  1.473e-01 -38.411
lon_numeric                                       -5.119e-01  1.185e-02 -43.203
lat_numeric                                       -3.849e-01  3.481e-02 -11.057
                                                  Pr(>|t|)    
(Intercept)                                        < 2e-16 ***
Days.with.AQI                                      < 2e-16 ***
Moderate.Days                                      < 2e-16 ***
Unhealthy.for.Sensitive.Groups.Days               0.192564    
Unhealthy.Days                                    4.73e-06 ***
Max.AQI                                            < 2e-16 ***
X90th.Percentile.AQI                              0.000505 ***
Median.AQI                                         < 2e-16 ***
Days.CO                                            < 2e-16 ***
Days.NO2                                           < 2e-16 ***
Days.Ozone                                         < 2e-16 ***
Gross.domestic.product..constant.prices           0.351425    
Gross.domestic.product.per.capita..current.prices  < 2e-16 ***
Inflation..average.consumer.prices                 < 2e-16 ***
Volume.of.imports.of.goods.and.services           3.05e-08 ***
Volume.of.exports.of.goods.and.services           4.25e-07 ***
Unemployment.rate                                 0.087269 .  
Current.account.balance                            < 2e-16 ***
lon_numeric                                        < 2e-16 ***
lat_numeric                                        < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 40.24 on 48501 degrees of freedom
Multiple R-squared:  0.6903,    Adjusted R-squared:  0.6902 
F-statistic:  5690 on 19 and 48501 DF,  p-value: < 2.2e-16

This multiple regression model shows a strong overall fit with an Adjusted R-squared value of 0.6902, indicating that approximately 69.02% of the variance in the dependent variable (index_nsa) is explained by the independent variables included in the model.

Several predictors exhibit significant relationships with the dependent variable, including Days.with.AQI, Moderate.Days, Max.AQI, Median.AQI, Days.CO, Days.NO2, Days.Ozone, Inflation, Volume.of.imports.of.goods.and.services, Volume.of.exports.of.goods.and.services, Unemployment.rate, Current.account.balance, lon_numeric, and lat_numeric, as indicated by their low p-values (p < 0.05).

However, some predictors such as Unhealthy.for.Sensitive.Groups.Days and Gross.domestic.product..constant.prices do not show statistically significant associations with the dependent variable, suggesting their limited contribution to the model’s predictive power.

The relationships we’ve uncovered between various factors and our outcomes, like the Housing Price Index (HPI) or air quality index (AQI), provide us with some fascinating insights. For instance, when we look at metrics like the number of days with poor air quality or the severity of pollution, we get a clear picture of how it impacts housing prices and people’s overall well-being. On the economic side, indicators like inflation rates, trade volumes, and unemployment rates give us clues about how economic conditions shape housing markets.

What’s particularly interesting is that some factors, like days when the air quality is unhealthy for sensitive groups or GDP at constant prices, don’t seem to have a big impact on our outcomes. This tells us that while they’re important, there are other factors, like pollutant concentrations and broader economic indicators, that play a bigger role in influencing housing prices and air quality. These insights are crucial for policymakers and city planners as they work to address housing affordability and environmental concerns in our communities.

The residual standard error of 40.24 indicates the average deviation of observed values from the fitted values, providing a measure of the model’s goodness of fit.
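The residual standard error reported above is just the square root of the residual sum of squares divided by the residual degrees of freedom. A minimal check on the built-in mtcars data (any lm fit behaves the same way):

```r
# Verify that sigma() matches the hand-computed residual standard error.
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss    <- sum(residuals(fit)^2)   # residual sum of squares
df_res <- df.residual(fit)        # n minus number of estimated coefficients
rse    <- sqrt(rss / df_res)      # residual standard error

all.equal(rse, sigma(fit))        # the two quantities agree
```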

Qualitative Analyses

Geospatial Data Visualization to Discover Patterns and Relationships

Geospatial data visualization is a powerful technique used to uncover patterns, trends, and relationships within spatially referenced data. By mapping data onto geographical coordinates, we are able to visualize complex spatial distributions, identify clusters, and understand spatial interactions between variables.

#index_nsa

# Define a function to create Leaflet map with custom popup and heatmap
create_leaflet_map <- function(data) {
  pal <- colorQuantile(c("green", "red"), domain = data$index_nsa, n = 5)  # Define color palette
  
  leaflet(data) %>%
    addTiles() %>%
    addCircleMarkers(
      lng = ~lon_numeric,
      lat = ~lat_numeric,
      radius = 7,
      color = ~pal(index_nsa),  # Use color palette based on index_nsa values
      fillOpacity = 0.7,
      popup = paste(
        "<b>CBSA:</b> ", data$CBSA, "<br>",
        "<b>Year:</b> ", data$Year, "<br>",
        "<b>Good Days:</b> ", data$Good.Days, "<br>",
        "<b>Max AQI:</b> ", data$Max.AQI, "<br>",
        "<b>Index NSA:</b> ", data$index_nsa
      ),
      label = ~substr(data$CBSA, 1, 20), # Shorten CBSA name for label
      labelOptions = labelOptions(noHide = FALSE)
    ) %>%
    addLegend(
      position = "bottomright",
      pal = pal,
      values = ~index_nsa,
      title = "Index NSA"
    )
}

# Call the function for each group
maps <- list()

for (i in unique(final_data_averaged$group)) {
  data <- filter(final_data_averaged, group == i)
  map <- create_leaflet_map(data)
  title <- htmltools::h2(paste("Housing Price Index (Heatmap) across a Geographic Region - Group ", i),
                         style = "font-family: Arial; font-size: 18px; color: Black; text-align: center;")
  maps[[i]] <- prependContent(map, title)
}
maps[[1]]

Housing Price Index (Heatmap) across a Geographic Region - Group 1

maps[[2]]

Housing Price Index (Heatmap) across a Geographic Region - Group 2

maps[[3]]

Housing Price Index (Heatmap) across a Geographic Region - Group 3

maps[[4]]

Housing Price Index (Heatmap) across a Geographic Region - Group 4

maps[[5]]

Housing Price Index (Heatmap) across a Geographic Region - Group 5

maps[[6]]

Housing Price Index (Heatmap) across a Geographic Region - Group 6

maps[[7]]

Housing Price Index (Heatmap) across a Geographic Region - Group 7

In our exploration of housing trends, we’ve uncovered a concerning pattern in the western part of the USA: steadily increasing housing costs. What’s driving this phenomenon? Well, it’s a mix of factors. Take a look at cities like San Francisco, Los Angeles, Seattle, and Denver. They’re buzzing with economic opportunities, drawing people in with promises of thriving job markets and enviable lifestyles.

But with all this growth comes a downside: a surge in demand for housing that surpasses what’s available. This mismatch between supply and demand has pushed housing prices through the roof, making affordable options scarce. And it’s not just about people wanting to move in – regulations on zoning and land use, coupled with limited space for development, have tightened the squeeze even further.

Adding to the mix are speculative real estate ventures and foreign investments pouring into urban areas, driving prices up even more. It’s a tough situation for folks, especially those with modest incomes, who find themselves grappling with the challenge of finding housing that fits their budget.

#aqi

# Define a function to create Leaflet map with custom popup and heatmap
create_leaflet_map2 <- function(data) {
  pal <- colorQuantile(c("green", "red"), domain = data$Max.AQI, n = 5)  # Define color palette
  
  leaflet(data) %>%
    addTiles() %>%
    addCircleMarkers(
      lng = ~lon_numeric,
      lat = ~lat_numeric,
      radius = 7,
      color = ~pal(Max.AQI),  # Color by Max.AQI so markers match the palette domain and legend
      fillOpacity = 0.7,
      popup = paste(
        "<b>CBSA:</b> ", data$CBSA, "<br>",
        "<b>Year:</b> ", data$Year, "<br>",
        "<b>Good Days:</b> ", data$Good.Days, "<br>",
        "<b>Max AQI:</b> ", data$Max.AQI, "<br>",
        "<b>Index NSA:</b> ", data$index_nsa
      ),
      label = ~substr(data$CBSA, 1, 20), # Shorten CBSA name for label
      labelOptions = labelOptions(noHide = FALSE)
    ) %>%
    addLegend(
      position = "bottomright",
      pal = pal,
      values = ~Max.AQI,
      title = "Average Max AQI"
    )
}

# Call the function for each group
maps2 <- list()
for (i in unique(final_data_averaged$group)) {
  data <- filter(final_data_averaged, group == i)
  map <- create_leaflet_map2(data)
  title <- htmltools::h2(paste("Air Quality Index (Heatmap) across a Geographic region - Group ", i),
                         style = "font-family: Arial; font-size: 18px; color: black; text-align: center;")
  maps2[[i]] <- htmlwidgets::prependContent(map, title)
}
maps2[[1]]

Air Quality Index (Heatmap) across a Geographic region - Group 1

maps2[[2]]

Air Quality Index (Heatmap) across a Geographic region - Group 2

maps2[[3]]

Air Quality Index (Heatmap) across a Geographic region - Group 3

maps2[[4]]

Air Quality Index (Heatmap) across a Geographic region - Group 4

maps2[[5]]

Air Quality Index (Heatmap) across a Geographic region - Group 5

maps2[[6]]

Air Quality Index (Heatmap) across a Geographic region - Group 6

maps2[[7]]

Air Quality Index (Heatmap) across a Geographic region - Group 7

From 1990 to 2023, the air quality index (AQI) in the USA has shown a concerning trend. Initially, there were positive outcomes as environmental regulations and technological advancements led to improvements in AQI. However, challenges emerged with the continued growth of urbanization, industrialization, and population density, particularly along borders and in densely populated areas.

Increased vehicular traffic, industrial emissions, and energy consumption have contributed to higher levels of air pollution, gradually deteriorating the AQI. Geographical factors such as proximity to major transportation routes and industrial zones have exacerbated air quality issues.

As pollution levels intensified, especially in heavily populated areas, addressing air quality concerns became increasingly urgent. Comprehensive regulatory measures, technological innovations, and public awareness campaigns are needed to mitigate the impact of air pollution on public health and the environment.
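The year-over-year trend described here can be checked directly from the data. A minimal sketch, assuming final_data_clean carries Year and Median.AQI columns; the toy values below stand in for the real file:

```r
# Average the median AQI across CBSAs for each year, then plot the trend.
library(dplyr)
library(ggplot2)

# Toy stand-in for final_data_clean: Median.AQI per CBSA-year
aqi_data <- data.frame(
  Year       = rep(c(1990, 2000, 2010, 2023), each = 3),
  Median.AQI = c(60, 55, 58, 48, 50, 46, 44, 45, 42, 47, 49, 50)
)

aqi_trend <- aqi_data %>%
  group_by(Year) %>%
  summarise(mean_median_aqi = mean(Median.AQI))

ggplot(aqi_trend, aes(Year, mean_median_aqi)) +
  geom_line() +
  geom_point() +
  labs(y = "Mean of Median AQI", title = "National AQI trend")
```

Averaging within each year first keeps years with many CBSAs from dominating the national picture.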

# Define the list of variables to include in the subset
variables_to_include <- c(
  "GSPC.Close",
  "Gross.domestic.product..constant.prices",
  "Gross.domestic.product.per.capita..constant.prices",
  "Gross.domestic.product.per.capita..current.prices",
  "Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total",
  "Inflation..average.consumer.prices",
  "Volume.of.imports.of.goods.and.services",
  "Volume.of.exports.of.goods.and.services",
  "Unemployment.rate"
)

# Create the subset
subset_data <- final_data_averaged %>%
  dplyr::select(Year, group, all_of(variables_to_include))

# Filter subset_data for the specified years
filtered_data <- subset_data %>%
  filter(Year %in% c(1991, 1996, 2001, 2006, 2011, 2016, 2021, 2023)) %>%
  distinct(Year, .keep_all = TRUE)

# Create a vector of years to highlight
highlight_years <- c(1991, 1996, 2001, 2006, 2011, 2016, 2021, 2023)

# Arrange by year first so the highlighted row indices match the displayed order
filtered_data <- filtered_data %>% arrange(Year)

# Display the formatted table, highlighting the selected years
filtered_data %>%
  kable(caption = "Filtered Data for Selected Years", align = "c") %>%
  kable_styling(full_width = FALSE) %>%
  row_spec(row = which(filtered_data$Year %in% highlight_years), background = "#FFFF00")
Filtered Data for Selected Years

| Year | group | GSPC.Close | Gross.domestic.product..constant.prices | Gross.domestic.product.per.capita..constant.prices | Gross.domestic.product.per.capita..current.prices | Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total | Inflation..average.consumer.prices | Volume.of.imports.of.goods.and.services | Volume.of.exports.of.goods.and.services | Unemployment.rate |
|------|-------|------------|------------|----------|----------|----------|----------|----------|----------|----------|
| 1996 | 1 | 493.5946 | 2.775167 | 38861.29 | 27058.73 | 19.94267 | 3.094333 | 7.355167 | 7.349167 | 6.391667 |
| 2001 | 2 | 1186.8270 | 3.750800 | 44831.24 | 34436.85 | 20.11600 | 2.452400 | 9.457800 | 4.391600 | 4.473600 |
| 2006 | 3 | 1126.2335 | 2.922000 | 49032.39 | 41848.77 | 19.22820 | 2.630000 | 6.537400 | 5.237000 | 5.401600 |
| 2011 | 4 | 1205.7735 | 0.758400 | 50824.69 | 48422.15 | 17.10500 | 2.228600 | 1.159000 | 5.259400 | 7.648200 |
| 2016 | 5 | 1843.3650 | 2.157200 | 53219.21 | 54927.09 | 16.07100 | 1.308400 | 3.091800 | 2.309000 | 6.348200 |
| 2021 | 6 | 3144.5455 | 2.132200 | 57060.47 | 64296.18 | 15.79400 | 2.463400 | 3.007000 | 0.071800 | 5.078400 |
| 2023 | 7 | 4084.0928 | 2.071857 | 60417.07 | 78087.18 | 15.48829 | 6.313714 | 3.405571 | 4.720000 | 3.610714 |

This table provides insights into various aspects of US society, offering a qualitative analysis of trends observed over the years.

Regarding stock market performance, the data shows a broadly upward trend in the S&P 500 index from 1996 to 2023, with a visible dip between 2001 and 2006, indicating overall long-run growth in the stock market.

Economic growth is reflected in the increasing trend of Gross Domestic Product (GDP) measures. Both GDP per capita at constant and current prices show steady growth over time, suggesting improvements in living standards and economic prosperity.

Inflation rates fluctuate over the years but generally remain within acceptable ranges, indicating stable economic conditions despite variations in consumer price levels.

Trade volume, represented by the volume of imports and exports, demonstrates fluctuations influenced by global economic conditions, trade policies, and exchange rates.

Lastly, the unemployment rate shows variations over time, influenced by economic growth, business cycles, and government policies, highlighting the dynamic nature of labor market conditions.

Overall, these metrics provide valuable insights into the economic landscape of the United States, offering a nuanced understanding of trends and patterns shaping various aspects of society.

In summary, this project comprehensively analyzed the United States’ economic indicators, housing prices, and air quality trends. By employing advanced statistical techniques and geospatial visualization, it uncovered significant correlations, identified key trends, and provided valuable insights into the complex dynamics shaping societal and environmental factors. These findings have the potential to contribute to a deeper understanding of regional disparities and inform evidence-based strategies for addressing housing affordability in the United States.